Parallel Performance Optimizations on Unstructured Mesh-based Simulations
Abstract
This paper addresses two key parallelization challenges in the unstructured mesh-based ocean modeling code MPAS-Ocean, which uses a mesh based on Voronoi tessellations: (1) load imbalance across processes, and (2) unstructured data access patterns that inhibit intra- and inter-node performance. Our work analyzes the load imbalance due to naive partitioning of the mesh and develops methods to generate mesh partitionings with better load balance and reduced communication. Furthermore, we present methods that minimize both inter- and intra-node data movement and maximize data reuse. Our techniques include predictive ordering of data elements for higher cache efficiency, as well as communication reduction approaches. We present detailed performance data from runs on thousands of cores of the Cray XC30 supercomputer and show that our optimization strategies can exceed the original performance by over 2×. Additionally, many of these solutions can be broadly applied to a wide variety of unstructured grid-based computations.
Similar Resources
Optimizing CAD and Mesh Generation Workflow for SeisSol
SeisSol is a simulation software for seismic wave propagation and earthquake scenarios. It solves the fully elastic wave equations in heterogeneous media. Incorporating dynamic rupture simulation, it performs complex multiphysics earthquake simulations. To account for complicated geometries, SeisSol uses a fully unstructured tetrahedral mesh. Recent publications [1], [2] have shown that SeisSol i...
JSweep: A Patch-centric Data-driven Approach for Parallel Sweeps on Large-scale Meshes
In mesh-based numerical simulations, sweep is an important computation pattern. During sweeping a mesh, computations on cells are strictly ordered by data dependencies in given directions. Due to such a serial order, parallelizing sweep is challenging, especially for unstructured and deforming structured meshes. Meanwhile, recent high-fidelity multi-physics simulations of particle transport, in...
Performance Analysis and Optimization of the OP2 Framework on Many-Core Architectures
This paper presents a benchmarking, performance analysis and optimization study of the OP2 ‘active’ library, which provides an abstraction framework for the parallel execution of unstructured mesh applications. OP2 aims to decouple the scientific specification of the application from its parallel implementation, and thereby achieve code longevity and near-optimal performance through re-targetin...
Compiler Optimizations for Industrial Unstructured Mesh CFD Applications on GPUs
Graphical Processing Units (GPUs) have shown acceleration factors over multicores for structured mesh-based Computational Fluid Dynamics (CFD). However, the value remains unclear for dynamic and irregular applications. Our motivating example is HYDRA, an unstructured mesh application used in production at Rolls-Royce for the simulation of turbomachinery components of jet engines. We describe th...
Sustained Petascale Performance of Seismic Simulations with SeisSol on SuperMUC
Seismic simulations in realistic 3D Earth models require peta- or even exascale computing power to capture small-scale features of high relevance for scientific and industrial applications. In this paper, we present optimizations of SeisSol – a seismic wave propagation solver based on the Arbitrary high-order accurate DERivative Discontinuous Galerkin (ADER-DG) method on fully adaptive, unstructu...
Publication date: 2015